Abstract

Graph theory is increasingly used in genetics, proteomics and neuroimaging. In such fields, the data of interest generally constitute weighted graphs. Analysis of such weighted graphs often requires the integration of topological metrics with respect to the density of the graph. Here, density refers to the proportion of possible edges that are present in that graph. When topological metrics based on shortest paths are of interest, such density-integration usually necessitates the iterative application of Dijkstra's algorithm in order to compute the shortest path matrix at each density level. In this short note, we describe a recursive shortest path algorithm based on single edge updating, which removes the need for the iterative use of Dijkstra's algorithm. Our proposed procedure is based on pairs of breadth-first searches around each of the two vertices incident to the edge added at each recursion. An algorithmic analysis of the proposed technique is provided. When the graph of interest is coded as an adjacency list, our algorithm can be shown to be more efficient than an iterative use of Dijkstra's algorithm.

Introduction 

The last ten years have seen a surge of interest in graph theory among biologists, physicists and other natural scientists. This was primarily stimulated by the seminal papers of Watts and Strogatz (1998) and Barabasi and Albert (1999). In particular, a wide range of different data types are now analyzed through systematic calculations of various topological measures, such as the characteristic path length or the clustering coefficient. In systems biology and neuroscience, subject-specific networks can be constructed in order to compare several populations of networks, for testing putative differences between groups of subjects (see Bullmore and Sporns, 2009, for a review). (For convenience, the terms network and graph will here be used interchangeably, as this reflects recent usage in the literature.) Such biological networks, however, tend to be weighted undirected graphs, which generally correspond to standardized covariance matrices between a set of regions of interest. By contrast, most of the topological measures introduced by Watts and Strogatz (1998) and Barabasi and Albert (1999) pertain to unweighted networks.

There is currently no general consensus on how to compute or compare the topology of weighted graphs. This is a particularly arduous problem, since it requires the use of real-valued mathematical tools on objects that are essentially discrete. One possible solution to this conundrum has been advanced by He et al. (2009), who suggested integrating the topological measures of interest with respect to the density of the network (see also Achard and Bullmore, 2007, Ginestet and Simmons, 2011). The density of a network is here defined as the proportion of possible edges that are present in a given graph. Such integration, however, is computationally expensive, as the number of density levels at which the metric must be evaluated grows quadratically with the number of nodes. A Monte Carlo scheme has been proposed in the literature to address this issue and approximate the value of such an integral (Ginestet et al., Submitted). Such Monte Carlo methods, however, also necessitate a large number of simulations in order to reduce the variability of the resulting estimates.

Most of the topological metrics of interest to researchers in neuroscience and systems biology involve the computation of the matrix of shortest paths, denoted $D$. These include, for instance, the global and local efficiency measures proposed by Latora and Marchiori (2001) (see also Latora and Marchiori, 2003). The computation of $D$ for a given network can be done efficiently using the celebrated algorithm of Dijkstra (1959). However, when considering weighted networks, Dijkstra's algorithm may need to be invoked as many times as there are edges in the graph of interest. In this short note, we address this specific problem by proposing a recursive shortest path algorithm based on applying single edge updates to $D$. In this setup, we work only with the shortest path matrix and compute the value of the desired topological metric at every density level. Taken together, we therefore provide an efficient algorithm for the density-integration of the topological functions of weighted networks.



Density-integration of Topological Metrics 

In this paper, our main focus will be on undirected weighted graphs containing no self-loops or multiple edges. However, since we also need to refer to unweighted graphs, we introduce the following notation. A graph $G$ is here defined as a triple $(\mathcal{V}, \mathcal{E}, \mathcal{W})$, where $\mathcal{V}(G)$ is the standard vertex set, $\mathcal{E}(G)$ is the edge set and $\mathcal{W}(G)$ is a multiset of real-valued weights. This convention generalizes to directed graphs. It also includes undirected unweighted (simple) graphs as special cases, for which the elements of $\mathcal{W}$ belong to $\{0, 1\}$. Such a setup may, for instance, apply to correlation matrices or other matrices of similarity measures with real-valued entries. In addition, we will make use of the following notation,

$$N_V := |\mathcal{V}(G)|, \qquad N_E := |\mathcal{E}(G)|, \qquad \text{and} \qquad N_I := \frac{1}{2}N_V(N_V - 1),$$

where $:=$ signifies that the left-hand side is defined as the right-hand side. Here, $N_I$ is the number of shortest paths in $G$, that is, the number of unordered pairs of distinct vertices. Naturally, $N_I$ takes this value because $G$ is undirected; for a directed network, $N_I$ would be $N_V(N_V - 1)$. For convenience, we will interchangeably use the following two sets of indices to label the elements of $\mathcal{W}$,

$$\mathcal{W}(G) = \{w_{12}, \ldots, w_{ij}, \ldots, w_{N_V-1,N_V}\} = \{w_1, \ldots, w_e, \ldots, w_{N_E}\}. \qquad (1)$$
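The two indexings in equation (1) can be related explicitly. The Python sketch below assumes, as for a correlation matrix, that every vertex pair carries a weight (so that $N_E = N_I$), and that the single index $e$ enumerates the upper triangle row by row; both the ordering and the function names `pair_to_index` and `index_to_pair` are hypothetical choices for illustration.

```python
def pair_to_index(i: int, j: int, n_v: int) -> int:
    """Map a vertex pair (i, j), 1 <= i < j <= n_v, to its single index e in 1..N_I.

    Assumes the upper triangle is enumerated row by row:
    (1, 2), (1, 3), ..., (1, n_v), (2, 3), ..., (n_v - 1, n_v).
    """
    if not 1 <= i < j <= n_v:
        raise ValueError("expected 1 <= i < j <= n_v")
    # Rows 1..i-1 contribute (n_v - 1) + (n_v - 2) + ... + (n_v - i + 1) pairs.
    before = (i - 1) * n_v - (i - 1) * i // 2
    return before + (j - i)


def index_to_pair(e: int, n_v: int) -> tuple[int, int]:
    """Inverse of pair_to_index: recover (i, j) from the single index e."""
    n_i = n_v * (n_v - 1) // 2
    if not 1 <= e <= n_i:
        raise ValueError("expected 1 <= e <= N_I")
    i = 1
    while e > n_v - i:  # skip whole rows; row i holds n_v - i pairs
        e -= n_v - i
        i += 1
    return i, i + e
```

For $N_V = 4$, the pairs $(1,2), (1,3), (1,4), (2,3), (2,4), (3,4)$ receive the indices $1, \ldots, 6$ in that order.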

Although we will restrict our attention to undirected graphs, an extension of our proposed technique to directed networks is discussed in the conclusion.

A range of topological metrics requiring the computation of the shortest path matrix have been proposed in the literature. Two popular choices are the global and local efficiency measures introduced by Latora and Marchiori (2001). Both of these quantities can be derived from the general definition of the efficiency, $E(\cdot)$, of a simple graph $G = (\mathcal{V}, \mathcal{E}, \mathcal{W})$, which is defined as follows,

$$E(G) := \frac{1}{N_I}\sum_{i<j}\frac{1}{d_{ij}}, \qquad (2)$$

where summation over $i < j$ refers to all the elements of the set $\{i < j : i, j = 1, \ldots, N_V\}$, and $d_{ij}$ denotes the length of the shortest path between vertices $v_i$ and $v_j$ in $G$, with the convention that $1/d_{ij} = 0$ when no such path exists. According to Latora and Marchiori (2001), the global and local efficiencies of an unweighted undirected graph are respectively defined as follows,

$$E_{\text{glo}}(G) := E(G) \qquad \text{and} \qquad E_{\text{loc}}(G) := \frac{1}{N_V}\sum_{i=1}^{N_V} E(G_i), \qquad (3)$$

where $G_i \subset G$ for every $i = 1, \ldots, N_V$, such that each $G_i$ is the subgraph induced by all the neighbors of the $i$th node. That is, $\mathcal{V}(G_i) = \{v_j \in \mathcal{V}(G) : v_jv_i \in \mathcal{E}(G)\}$.
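As a concrete reading of equations (2) and (3), the following Python sketch computes the global and local efficiencies of an unweighted, undirected graph stored as an adjacency list (a dict mapping each vertex to the set of its neighbors). It is an illustration under stated assumptions, not the authors' code: shortest paths are obtained by breadth-first search, unreachable pairs contribute $1/d_{ij} = 0$, and $E(G_i)$ is taken to be 0 when $G_i$ has fewer than two vertices.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from source to every vertex reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def efficiency(adj):
    """Equation (2): E(G) = (1/N_I) * sum_{i<j} 1/d_ij, unreachable pairs adding 0."""
    nodes = list(adj)
    n_v = len(nodes)
    if n_v < 2:
        return 0.0  # assumed convention for graphs with fewer than two vertices
    n_i = n_v * (n_v - 1) / 2
    total = 0.0
    for idx, u in enumerate(nodes):
        dist = bfs_distances(adj, u)
        for w in nodes[idx + 1:]:
            if w in dist:
                total += 1.0 / dist[w]
    return total / n_i


def global_efficiency(adj):
    return efficiency(adj)


def local_efficiency(adj):
    """Equation (3): mean of E(G_i), G_i induced by the neighbors of vertex i."""
    total = 0.0
    for i, nbrs in adj.items():
        sub = {u: adj[u] & nbrs for u in nbrs}  # keep only edges among the neighbors
        total += efficiency(sub)
    return total / len(adj)
```

On a triangle both efficiencies equal 1, whereas on the path $v_1 v_2 v_3$ the global efficiency is $(1 + 1 + 1/2)/3$ and the local efficiency is 0.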

The efficiency, or any other topological function of $G$, which we will denote by $\mathcal{T}(G)$, is ill-defined for a weighted graph $G = (\mathcal{V}, \mathcal{E}, \mathcal{W})$. In such a case, one may resort to integrating the topological measure of interest with respect to all the possible densities of the graph under scrutiny, where the density of an unweighted undirected graph is defined as follows,

$$K(G) := \frac{N_E}{N_I}. \qquad (4)$$

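Now, integrating a topological function with respect to the different densities of $G$ yields the density-integrated metric $\overline{\mathcal{T}}(G)$, which can be expressed as

$$\overline{\mathcal{T}}(G) := \int_0^1 \mathcal{T}(\gamma(G,k))\,dk, \qquad (5)$$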

where the function $\gamma(G,k)$ in equation (5) is a density-thresholding function, which takes a weighted network and a specific density level and returns an unweighted network with density $k$. Here, density is treated as a discrete random variable, $K$, whose realizations are denoted in lower case. As $K$ is discrete, it only takes a finite number of values, which form the following set,

$$\mathcal{K} := \left\{\frac{1}{N_I}, \frac{2}{N_I}, \ldots, \frac{N_I}{N_I}\right\}, \qquad (6)$$

where $|\mathcal{K}| = N_I$. It will also be useful to label the elements of $\mathcal{K}$ as $k_t$, with indices $t = 1, \ldots, N_I$. Therefore, equation (5) can be rewritten as follows,



$$\overline{\mathcal{T}}(G) = \frac{1}{N_I}\sum_{t=1}^{N_I} \mathcal{T}(\gamma(G,k_t)). \qquad (7)$$
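To make the thresholding function $\gamma(G,k)$ and the density levels $k_t$ concrete, here is a small Python sketch. It assumes, as is usual for correlation-type weights, that $\gamma(G,k)$ retains the $kN_I$ pairs with the largest weights; `threshold`, `density` and `density_levels` are hypothetical helper names, and weights are passed as a dict keyed by 0-based pairs `(i, j)` with `i < j`.

```python
def density_levels(n_v):
    """The set of density levels k_t = t / N_I, t = 1, ..., N_I."""
    n_i = n_v * (n_v - 1) // 2
    return [t / n_i for t in range(1, n_i + 1)]


def threshold(weights, n_v, k):
    """Hypothetical gamma(G, k): keep the round(k * N_I) largest weights as unweighted edges."""
    n_i = n_v * (n_v - 1) // 2
    n_keep = round(k * n_i)  # round() absorbs floating-point error in k * N_I
    strongest = sorted(weights, key=weights.get, reverse=True)[:n_keep]
    adj = {v: set() for v in range(n_v)}
    for i, j in strongest:
        adj[i].add(j)
        adj[j].add(i)
    return adj


def density(adj):
    """Equation (4): N_E / N_I for an unweighted, undirected adjacency list."""
    n_v = len(adj)
    n_e = sum(len(nbrs) for nbrs in adj.values()) // 2  # each edge is stored twice
    return n_e / (n_v * (n_v - 1) / 2)
```

For example, with weights `{(0, 1): 0.9, (0, 2): 0.1, (1, 2): 0.5}`, thresholding at $k = 2/3$ keeps the edges $v_0v_1$ and $v_1v_2$, and `density` of that graph returns $2/3$.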

If the topological metric of interest involves the computation of the matrix of shortest paths, $D$, for every thresholding of $G$, such an integration would necessitate invoking Dijkstra's algorithm, or an equivalent method, $N_I$ times. In the next section, we propose an alternative to this computationally expensive integration, which directly updates $D$ for every new density, instead of updating the underlying adjacency matrix, $A$, and recomputing $D$ from scratch.
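For comparison, the following Python sketch evaluates equation (7) in the expensive way described above, recomputing $D$ from scratch at every density level. Because each thresholded graph $\gamma(G,k_t)$ is unweighted, one breadth-first search per source vertex stands in for Dijkstra's algorithm here. The names `apsp_bfs` and `naive_density_integration` are hypothetical, global efficiency is used as the example metric $\mathcal{T}$, and edges are assumed to enter in decreasing order of weight.

```python
import math
from collections import deque


def apsp_bfs(adj, n_v):
    """All-pairs hop distances, one BFS per source vertex; math.inf marks unreachable pairs."""
    d = [[math.inf] * n_v for _ in range(n_v)]
    for s in range(n_v):
        d[s][s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if d[s][w] == math.inf:
                    d[s][w] = d[s][u] + 1
                    queue.append(w)
    return d


def global_efficiency_from_d(d):
    """Equation (2) evaluated on a shortest path matrix; 1 / inf contributes 0."""
    n_v = len(d)
    n_i = n_v * (n_v - 1) / 2
    return sum(1.0 / d[i][j] for i in range(n_v) for j in range(i + 1, n_v)) / n_i


def naive_density_integration(weights, n_v, metric=global_efficiency_from_d):
    """Equation (7) the expensive way: D is recomputed from scratch at each k_t."""
    order = sorted(weights, key=weights.get, reverse=True)  # assumed: strongest pairs enter first
    adj = {v: set() for v in range(n_v)}
    values = []
    for i, j in order:  # after t insertions the graph is gamma(G, k_t)
        adj[i].add(j)
        adj[j].add(i)
        values.append(metric(apsp_bfs(adj, n_v)))
    return sum(values) / len(values)
```

With the three-vertex weights `{(0, 1): 0.9, (0, 2): 0.1, (1, 2): 0.5}`, the global efficiencies at $k_1, k_2, k_3$ are $1/3$, $2.5/3$ and $1$, so the density-integrated value is their mean, $6.5/9$.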



Recursive Shortest Path Algorithm for Density-integration 

Our strategy for bypassing the need to invoke a shortest path algorithm at every density level consists of three stages: (i) we compute the ranks of the entries of the weight matrix; (ii) we update the shortest path matrix by successively adding edges in the order given by the ranks obtained in the first stage; and finally (iii) we collect the values of the topological metric of interest for every shortest path matrix and return the mean value of that topological metric. We describe these three stages in turn.

Firstly, we compute the ranks for the weighted network of interest, $G = (\mathcal{V}, \mathcal{E}, \mathcal{W})$, as follows,

$$R_{ij}(\mathcal{W}) := \frac{1}{N_I}\sum_{u<v} I\{w_{uv} \geq w_{ij}\}, \qquad (8)$$

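where $I\{\cdot\}$ is the indicator function, returning 1 if its argument is true and 0 otherwise. We will assume that there are no ties in the values of $\mathcal{W}$; in practice, ties can be resolved by randomization. Under this assumption, each rank $R_{ij}(\mathcal{W})$ equals exactly one of the density levels $k_t$.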
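The ranks of equation (8) can be computed with a single sort rather than the literal double count. This Python sketch takes equation (8) in the normalized form in which the count is divided by $N_I$, so that the ranks coincide with the density levels $k_t$; the function name `ranks` and the use of a random secondary sort key to break ties are assumptions made for illustration.

```python
import random


def ranks(weights, rng=None):
    """Equation (8): R_ij = (1/N_I) * #{u < v : w_uv >= w_ij}.

    weights maps each pair (i, j), i < j, to a real weight. Ties are broken at
    random through a secondary sort key. With no ties, the pair at position p
    (0-based) of the descending order has exactly p + 1 weights >= its own,
    so its rank is (p + 1) / N_I.
    """
    rng = rng or random.Random()
    jitter = {pair: rng.random() for pair in weights}
    order = sorted(weights, key=lambda p: (weights[p], jitter[p]), reverse=True)
    n_i = len(order)
    return {pair: (pos + 1) / n_i for pos, pair in enumerate(order)}
```

Sorting costs $O(N_I \log N_I)$, compared with $O(N_I^2)$ for evaluating the indicator sum in equation (8) directly; both give the same values when there are no ties.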

Secondly, we extract each edge in the order given by the ranks. That is, running over the ranks $k_t$, where $t = 1, \ldots, N_I$, we obtain the following ordered sequence of $N_I$ vertex pairs,

$$\{v_1, v_2\}_t := \operatorname*{argmax}_{\{i<j\}} I\{R_{ij}(\mathcal{W}) = k_t\}. \qquad (9)$$
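Equation (9) simply inverts the rank map. A Python sketch, again assuming tie-free normalized ranks $k_t = t/N_I$ and using the hypothetical function name `ordered_pairs`, reads as follows; the returned list is the edge insertion order used by the recursion.

```python
def ordered_pairs(ranks):
    """Equation (9): list the pairs {v1, v2}_t for t = 1, ..., N_I.

    ranks maps each pair (i, j), i < j, to its normalized rank k_t = t / N_I
    (assumed tie-free). Pair t is the unique pair whose rank equals k_t.
    """
    n_i = len(ranks)
    by_t = [None] * n_i
    for pair, r in ranks.items():
        t = round(r * n_i)  # recover the integer index t from k_t = t / N_I
        by_t[t - 1] = pair
    return by_t
```

For instance, the ranks `{(0, 1): 1/3, (1, 2): 2/3, (0, 2): 1.0}` yield the insertion order `[(0, 1), (1, 2), (0, 2)]`.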

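It then suffices to update $D$ recursively using each of these pairs, starting from $D_0$, the shortest path matrix of the empty graph on $N_V$ vertices (zeros on the diagonal and $\infty$ elsewhere), as follows,

$$D_t = \texttt{edgeUpdate}(D_{t-1}, \{v_1, v_2\}_t). \qquad (10)$$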

For each $D_t$, we can now compute the topological measure based on this particular shortest path matrix, $\mathcal{T}(D_t)$. Finally, it remains to compute the mean value of these collected topological measures in order to obtain the desired density-integrated metric of the graph of interest. That is,

$$\overline{\mathcal{T}}(G) = \frac{1}{N_I}\sum_{t=1}^{N_I} \mathcal{T}(D_t). \qquad (11)$$
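The recursion in equations (10) and (11) can be sketched in Python as below. To keep the example self-contained, it swaps in a simpler, well-known $O(N_V^2)$ insertion rule for unit-length edges in place of the BFS-based edgeUpdate: after inserting $\{a, b\}$, any shortened path uses the new edge exactly once, so $d_{ij} \leftarrow \min(d_{ij}, d_{ia} + 1 + d_{bj}, d_{ib} + 1 + d_{aj})$. The names `simple_edge_update` and `density_integrate` are hypothetical, and global efficiency serves as the example metric.

```python
import math


def simple_edge_update(d, a, b):
    """Insert the unit edge {a, b} into the hop-distance matrix d, in place.

    Stand-in for edgeUpdate (not the BFS procedure of the text): a shortened
    path uses the new edge once, so d_ij <- min(d_ij, d_ia + 1 + d_bj, d_ib + 1 + d_aj).
    """
    n_v = len(d)
    to_a = [d[i][a] for i in range(n_v)]  # distances to a and b before the insertion
    to_b = [d[i][b] for i in range(n_v)]
    for i in range(n_v):
        for j in range(n_v):
            via = min(to_a[i] + 1 + to_b[j], to_b[i] + 1 + to_a[j])
            if via < d[i][j]:
                d[i][j] = via
    return d


def global_efficiency_from_d(d):
    """Equation (2) on a shortest path matrix; 1 / inf contributes 0."""
    n_v = len(d)
    n_i = n_v * (n_v - 1) / 2
    return sum(1.0 / d[i][j] for i in range(n_v) for j in range(i + 1, n_v)) / n_i


def density_integrate(pairs, n_v, metric=global_efficiency_from_d):
    """Equations (10)-(11): start from D_0, insert pairs in rank order, average T(D_t)."""
    d = [[0 if i == j else math.inf for j in range(n_v)] for i in range(n_v)]  # D_0
    values = []
    for a, b in pairs:
        simple_edge_update(d, a, b)  # D_t from D_{t-1}
        values.append(metric(d))     # T(D_t)
    return sum(values) / len(values)
```

Feeding it the insertion order `[(0, 1), (1, 2), (0, 2)]` on three vertices gives $6.5/9$, the same value as recomputing $D$ at every density level.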

The crux of this method lies in the edgeUpdate function in equation (10), which proceeds as follows. At each step $t$, we ask how the addition of a new edge to the existing graph affects its shortest path relationships. Our algorithm answers this question with two successive breadth-first searches (BFS) around the two vertices incident to the edge added at step $t$. Firstly, we conduct a BFS around $v_2$ and check whether the shortest path between $v_1$ and each of the $m$th-degree neighbors of $v_2$ is shortened by the addition of the new edge between $v_1$ and $v_2$. Secondly, we conduct a BFS centred at $v_1$, where we check whether the shortest paths between the neighbors of $v_2$ that were modified in the first stage (together with $v_2$ itself) and the $m$th-degree neighbors of $v_1$ are shortened by the introduction of the new edge. The full edge updating algorithm is described in pseudocode in Figure 1. For simplicity, we present the algorithm with each $D_t$ coded as a full matrix; however, a list representation can also be adopted to minimise storage space. We also provide a graphical description of our edge updating algorithm for density-integration in Figure 2. A C++ version of this algorithm is freely available as part of the NetworkAnalysis package on the R platform (http://cran.r-project.org/package=NetworkAnalysis).
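The following Python sketch is our reading of the two-phase procedure described above and in Figure 1; it is not the C++ code of the NetworkAnalysis package. It stores $D$ as a list of lists with `math.inf` for unreachable pairs, keeps an adjacency list of sets alongside it (an assumption, since Figure 1 lists only $D$ as input), and uses the hypothetical names `edge_update` and `empty_distance_matrix`. Phase II relies on $d_{v_1u}$ having already been updated in phase I for every $u$ in $S_0$.

```python
import math


def empty_distance_matrix(n_v):
    """D_0 for the empty graph: zeros on the diagonal, math.inf elsewhere."""
    return [[0 if i == j else math.inf for j in range(n_v)] for i in range(n_v)]


def edge_update(d, adj, v1, v2):
    """Insert edge {v1, v2}; update hop distances d and adjacency sets adj in place."""
    adj[v1].add(v2)
    adj[v2].add(v1)
    n_v = len(d)
    d[v1][v2] = d[v2][v1] = 1

    # Phase I: pruned BFS around v2. v1 starts in s0 (visited), so the search
    # never crosses the new edge. A vertex u at level m gets d(v1, u) = m + 1
    # when that is shorter; only such modified vertices are expanded further.
    s0 = {v1, v2}
    frontier = {v2}
    for m in range(1, n_v - 1):
        a = {w for u in frontier for w in adj[u]} - s0
        nxt = set()
        for u in a:
            if d[v1][u] > m + 1:
                d[v1][u] = d[u][v1] = m + 1
                nxt.add(u)
                s0.add(u)
        if not nxt:
            break
        frontier = nxt

    # Phase II: pruned BFS around v1, excluding v2 and the vertices modified in
    # phase I. For v at level m, the path v ~ v1 - v2 ~ u has length m + d(v1, u).
    s0.discard(v1)
    frontier = {v1}
    for m in range(1, n_v - 1):
        a = {w for v in frontier for w in adj[v]} - s0
        nxt = set()
        for v in a:
            for u in s0:
                if d[v][u] > d[v1][u] + m:
                    d[v][u] = d[u][v] = d[v1][u] + m
                    nxt.add(v)
        if not nxt:
            break
        frontier = nxt
    return d
```

On the graph of Figure 2 (vertices $v_1, \ldots, v_7$ mapped to indices 0 to 6), inserting $v_4v_5$ after the other edges sets, for example, $d_{17} = 5$ via the path $v_1v_2v_4v_5v_6v_7$. Pruning both searches to modified vertices is what keeps each update close to the cost of a BFS rather than a full recomputation.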
Algorithmic Analysis

When the graph of interest is stored as an adjacency matrix, Dijkstra's algorithm has time complexity $O(|\mathcal{V}|^2)$. Since density-integration would require invoking that algorithm $N_I = N_V(N_V - 1)/2$ times, the complexity would, in that case, be $O(|\mathcal{V}|^4)$. When the graph is coded as a matrix, our proposed algorithm does not perform better than repeated applications of Dijkstra's algorithm: as the complexity of a BFS is then $O(|\mathcal{V}|^2)$ and we perform $N_I$ such (pairs of) searches, it follows that, in the worst-case scenario, the complexity of our proposed method is also $O(|\mathcal{V}|^4)$. However, if the graph of interest is coded as an adjacency list, each BFS is $O(|\mathcal{E}| + |\mathcal{V}|)$, and therefore the entire recursive shortest path algorithm has complexity $O(|\mathcal{V}|^2|\mathcal{E}| + |\mathcal{V}|^3)$. By contrast, repeated applications of Dijkstra's algorithm on an adjacency list only reduce to $O(|\mathcal{V}|^2|\mathcal{E}|\log|\mathcal{V}|)$, or to $O(|\mathcal{V}|^2|\mathcal{E}| + |\mathcal{V}|^3\log|\mathcal{V}|)$ using a Fibonacci heap. Thus, our algorithm outperforms $N_I$ runs of Dijkstra's algorithm when the graph of interest is coded as an adjacency list.
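The adjacency-list comparison in the preceding paragraph amounts to the following arithmetic, which multiplies the per-update costs stated there by the $N_I = N_V(N_V - 1)/2 = O(|\mathcal{V}|^2)$ density levels (the $O(|\mathcal{E}|\log|\mathcal{V}|)$ per-run cost corresponds to a binary-heap implementation of Dijkstra's algorithm):

$$
\begin{aligned}
\text{recursive updates:}\quad & N_I \cdot O(|\mathcal{E}| + |\mathcal{V}|) = O(|\mathcal{V}|^2|\mathcal{E}| + |\mathcal{V}|^3),\\
\text{Dijkstra, binary heap:}\quad & N_I \cdot O(|\mathcal{E}|\log|\mathcal{V}|) = O(|\mathcal{V}|^2|\mathcal{E}|\log|\mathcal{V}|),\\
\text{Dijkstra, Fibonacci heap:}\quad & N_I \cdot O(|\mathcal{E}| + |\mathcal{V}|\log|\mathcal{V}|) = O(|\mathcal{V}|^2|\mathcal{E}| + |\mathcal{V}|^3\log|\mathcal{V}|).
\end{aligned}
$$

Each term on the first line is bounded by the corresponding term on the Fibonacci-heap line, which differs only by the $\log|\mathcal{V}|$ factor on the cubic term.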

Conclusion 

In this paper, we have described a recursive shortest path algorithm for weighted graphs, which can be used for integrating topological metrics with respect to density. The proposed method can readily be generalized to directed networks. In such a case, one simply needs to define a graph $G = (\mathcal{V}, \mathcal{E}, \mathcal{W})$, where the elements of $\mathcal{E}(G)$ are ordered pairs of vertices. The edgeUpdate function in equation (10) can then be modified to check for directed shortest paths instead of undirected ones. Given the growing interest of natural scientists in the topological properties of graphs and the wide availability of weighted networks, algorithms of the type described in this paper are likely to become ubiquitous.
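As a hedged illustration of the directed generalization mentioned above, a simple stand-in update (not the BFS-based edgeUpdate, whose directed version is not spelled out here) can be adapted to arcs. When a unit-length arc $(a, b)$ is inserted, a shortened directed path from $v_i$ to $v_j$ must traverse it exactly once, from $a$ to $b$, so $d_{ij} \leftarrow \min(d_{ij}, d_{ia} + 1 + d_{bj})$, and $D$ is no longer symmetric. The name `directed_edge_update` is hypothetical; in the directed case there would be $N_V(N_V - 1)$ such insertions.

```python
def directed_edge_update(d, a, b):
    """Insert the unit-length arc a -> b into the directed hop-distance matrix d, in place.

    d[i][j] is the length of the shortest directed path from i to j (math.inf if
    none). A shortened path i ~ a -> b ~ j uses the new arc once, in that direction.
    """
    n_v = len(d)
    to_a = [d[i][a] for i in range(n_v)]  # old distances i -> a
    from_b = list(d[b])                   # old distances b -> j (copied before updating)
    for i in range(n_v):
        for j in range(n_v):
            via = to_a[i] + 1 + from_b[j]
            if via < d[i][j]:
                d[i][j] = via
    return d
```

For example, inserting the arcs $v_1 \to v_2$ and $v_2 \to v_3$ gives $d_{13} = 2$ while $d_{31}$ remains infinite.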
Edge Updating of D

## Inputs: D, {v1, v2}.
## Output: D.
## N(u): set of neighbors of u in the graph including the new edge.

### Initialization:
Set Nv = D.ncol();
d[v1,v2] = d[v2,v1] = 1;

### BFS around v2:
Set S0 = {v1, v2}, S^(0) = {v2};
FOR (m = 1, ..., Nv - 2) DO
    A = (∪_{u ∈ S^(m-1)} N(u)) \ S0;
    FOR (u ∈ A) DO
        IF (d[v1,u] > m + 1)
            d[v1,u] = d[u,v1] = m + 1; Add u to S^(m); Add u to S0;
        END IF;
    END FOR;
    IF S^(m) = ∅ BREAK;
END FOR;

### BFS around v1:
Set S0 = S0 \ {v1}, S^(0) = {v1};
FOR (m = 1, ..., Nv - 2) DO
    A = (∪_{v ∈ S^(m-1)} N(v)) \ S0;
    FOR (v ∈ A) DO
        FOR (u ∈ S0) DO
            IF (d[v,u] > d[v1,u] + m)
                d[v,u] = d[u,v] = d[v1,u] + m; Add v to S^(m);
            END IF;
        END FOR;
    END FOR;
    IF S^(m) = ∅ BREAK;
END FOR;

Return D;



Figure 1. Updating of $D$ by inserting one edge at a time, here denoted $v_1v_2$. The set $S_0$ is the set of visited vertices, whereas the $S^{(m)}$'s are the sets of unvisited vertices corresponding to the $m$th-degree neighborhoods of the previously modified vertices, and $A$ is the set of relevant vertices at every level of the BFS. $A$, $S_0$ and the $S^{(m)}$'s should be regarded as containers, where adding means inserting a new element into a set.
a) Update: New edge between v4 and v5.

D     v1   v2   v3   v4   v5   v6   v7
v1     0    1    3    2    ∞    ∞    ∞
v2     1    0    2    1    ∞    ∞    ∞
v3     3    2    0    1    ∞    ∞    ∞
v4     2    1    1    0    ∞    ∞    ∞
v5     ∞    ∞    ∞    ∞    0    1    2
v6     ∞    ∞    ∞    ∞    1    0    1
v7     ∞    ∞    ∞    ∞    2    1    0

b) Phase I: Breadth-first search around v5.

c) Phase II: Breadth-first search around v4.



Figure 2. Graphical representation of the edge updating algorithm, which modifies the shortest path matrix, $D$, one edge at a time. In panel (a), a new edge, $v_4v_5$, is added to an existing graph, which is otherwise composed of two disconnected components. In panel (b), we conduct a BFS around $v_5$ with respect to $v_4$, updating $D$ with the new shortest paths between $v_4$ and $v_5$ and its first- and second-degree neighbors, represented in red, yellow and purple, respectively. In panel (c), we conduct a BFS around $v_4$ with respect to the vertices modified in phase I of edgeUpdate, denoted in blue. The first- and second-degree neighbors of $v_4$ are here denoted in orange and purple, respectively. In each panel, the corresponding modifications to the matrix of shortest paths are reported on the right-hand side. A dashed line between two vertices indicates that we test whether the inclusion of $v_4v_5$ shortens the shortest path between these two vertices.